SINGLE-CASE DESIGNS TECHNICAL DOCUMENTATION



In an effort to expand the pool of scientific evidence available for review, the What Works 
Clearinghouse (WWC) assembled a panel of national experts in single-case design (SCD) and 
analysis to draft SCD Standards. In this paper, the panel provides an overview of SCDs, specifies 
the types of questions that SCDs are designed to answer, and discusses the internal validity of 
SCDs. The panel then proposes SCD Standards to be implemented by the WWC. The Standards 
are divided into Design Standards and Evidence Standards (see Figure 1). The Design Standards
evaluate the internal validity of the design; on the basis of these Standards, reviewers assign each
study one of three ratings: Meets Standards, Meets Standards with Reservations, or Does Not Meet
Standards. Reviewers trained in visual analysis then apply the Evidence Standards to studies
that meet standards (with or without reservations), categorizing each outcome variable as
demonstrating Strong Evidence, Moderate Evidence, or No Evidence.



A. OVERVIEW OF SINGLE-CASE DESIGNS 

SCDs are adaptations of interrupted time-series designs and can provide a rigorous 
experimental evaluation of intervention effects (Horner & Spaulding, in press; Kazdin, 1982, in
press; Kratochwill, 1978; Kratochwill & Levin, 1992; Shadish, Cook, & Campbell, 2002). 
Although the basic SCD has many variations, these designs often involve repeated, systematic 
measurement of a dependent variable before, during, and after the active manipulation of an 
independent variable (e.g., applying an intervention). SCDs can provide a strong basis for 
establishing causal inference, and these designs are widely used in applied and clinical 
disciplines in psychology and education, such as school psychology and the field of special 
education. 

SCDs are identified by the following features: 



• An individual “case” is the unit of intervention and unit of data analysis (Kratochwill 
& Levin, in press). A case may be a single participant or a cluster of participants (e.g., 
a classroom or a community). 

• Within the design, the case provides its own control for purposes of comparison. For 
example, the case’s series of outcome variables are measured prior to the intervention 
and compared with measurements taken during (and after) the intervention. 

• The outcome variable is measured repeatedly within and across different conditions 
or levels of the independent variable. These different conditions are referred to as 
phases (e.g., baseline phase, intervention phase). 



Because SCDs are experimental designs, a central goal is to determine whether a causal relation
(i.e., functional relation) exists between the introduction of a researcher-manipulated 
independent variable (i.e., an intervention) and change in a dependent (i.e., outcome) variable 
(Horner & Spaulding, in press; Levin, O’Donnell, & Kratochwill, 2003). Experimental control 
involves replication of the intervention in the experiment, and this replication is addressed with
one of the following methods (Horner et al., 2005):



• Introduction and withdrawal (i.e., reversal) of the independent variable (e.g., ABAB 
design) 

• Iterative manipulation of the independent variable across different observational 
phases (e.g., alternating treatments design) 

• Staggered introduction of the independent variable across different points in time 
(e.g., multiple baseline design) 



SCDs have many variants. Although flexible and adaptive, a SCD is shaped by its research
question(s) and objective(s), which must be defined with precision, taking into consideration the
specifics of the independent variable tailored to the case(s), setting(s), and the desired
outcome(s) (i.e., a primary dependent variable). For example, if the dependent variable is
unlikely to return to baseline levels once it has changed in response to the initial intervention,
then an ABAB reversal design would not be appropriate, whereas a multiple baseline design
across cases would be appropriate. Therefore, the research question generally drives the
selection of an appropriate SCD.



B. CAUSAL QUESTIONS THAT SCDS ARE DESIGNED TO ANSWER 

The goal of a SCD is usually to answer the question “Is this intervention more effective than the
current ‘baseline’ or ‘business-as-usual’ condition?” SCDs are particularly appropriate for
understanding the responses of one or more cases to an intervention under specific conditions 
(Horner & Spaulding, in press). SCDs are implemented when pursuing the following research 
objectives (Horner et al., 2005): 



• Determining whether a causal relation exists between the introduction of an 
independent variable and a change in the dependent variable. For example, a research 
question might be “Does Intervention B reduce a problem behavior for this case (or 
these cases)?” 

• Evaluating the effect of altering a component of a multi-component independent 
variable on a dependent variable. For example, a research question might be “Does 
adding Intervention C to Intervention B further reduce a problem behavior for this 
case (or these cases)?” 

• Evaluating the relative effects of two or more independent variables (e.g., alternating 
treatments) on a dependent variable. For example, a research question might be “Is 
Intervention B or Intervention C more effective in reducing a problem behavior for 
this case (or these cases)?” 
SCDs are especially appropriate for pursuing research questions in applied and clinical
fields, largely because disorders with low prevalence may be difficult to study with traditional
group designs that require a large number of participants for adequate statistical power (Odom
et al., 2005). Further, in group designs, the particulars of who responded to an intervention under
which conditions might be obscured when reporting only group means and associated effect sizes
(Horner et al., 2005). SCDs afford the researcher an opportunity to provide detailed
documentation of the characteristics of those cases that did respond to an intervention and those
that did not (i.e., nonresponders). For this reason, the panel recommends that WWC reviewers
systematically specify the conditions under which an intervention is and is not effective for cases
being considered, if this information is available in the research report.

Because the underlying goal of SCDs is most often to determine “Which intervention is 
effective for this case (or these cases)?” the designs are intentionally flexible and adaptive. For 
example, if a participant is not responding to an intervention, then the independent variables can 
be manipulated while continuing to assess the dependent variable (Horner et al., 2005). Because
of the adaptive nature of SCDs, nonresponders might ultimately be considered
“responders” under particular conditions. 1 In this regard, SCDs provide a window into the
process of participant change. SCDs are also flexible in that the number of data points collected
during a phase can be increased to promote a stable set of observations, and this feature may
provide additional insight into participant change.



C. THREATS TO INTERNAL VALIDITY IN SINGLE-CASE DESIGN 2 

Similar to group randomized controlled trial designs, SCDs are structured to address major 
threats to internal validity in the experiment. Internal validity in SCDs can be improved through 
replication and/or randomization (Kratochwill & Levin, in press). Although it is possible to use 
randomization in structuring experimental SCDs, these applications are still rare. Whereas most
randomized controlled trial group intervention designs rely on random assignment, most
single-case researchers have addressed internal validity concerns through the structure of the
design and systematic replication of the effect within the course of the experiment (e.g., Hersen
& Barlow, 1976; Horner et al., 2005; Kazdin, 1982; Kratochwill, 1978; Kratochwill & Levin, 1992). The former
(design structure, discussed in the Standards as “Criteria for Designs...”) can be referred to as 
“methodological soundness” and the latter (effect replication, discussed in the Standards as 
“Criteria for Demonstrating Evidence...”) is a part of what can be called “evidence credibility” 
(see, for example, Kratochwill & Levin, in press). 



1 WWC Principal Investigators (PIs) will need to consider whether variants of interventions constitute distinct
interventions. Distinct interventions will be evaluated individually with the SCD Standards. For example, if the 
independent variable is changed during the course of the study, then the researcher must begin the replication series 
again to meet the design standards. 

2 Prepared by Thomas Kratochwill with input from Joel Levin, Robert Horner, and William Shadish. 
In SCD research, effect replication is an important mechanism for controlling threats to
internal validity, and its role is central for each of the various threats discussed below. In fact, the
replication criterion discussed by Horner et al. (2005, p. 168) represents a fundamental
characteristic of SCDs: “In most [instances] experimental control is demonstrated when the 
design documents three demonstrations of the experimental effect at three different points in 
time with a single case (within-case replication), or across different cases (inter-case replication) 
(emphasis added).” As these authors note, an experimental effect is demonstrated when the 
predicted changes in the dependent measures covary with manipulation of the independent 
variable. This criterion of three replications has been included in the Standards as a requirement
for designs to meet evidence standards. Currently, there is no formal basis for the “three
demonstrations” recommendation; rather, it represents a conceptual norm in published articles,
research, and textbooks that recommend methodological standards for single-case experimental
designs (Kratochwill & Levin, in press).

Important to note are the terms level, trend and variability. “Level” refers to the mean score 
for the data within a phase. “Trend” refers to the slope of the best-fitting straight line for the data 
within a phase, and “variability” refers to the fluctuation of the data (as reflected by the data’s 
range or standard deviation) around the mean. See pages 17-20 for greater detail. 
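To make these three definitions concrete, the sketch below computes them for a single phase. It is a minimal illustration, not a WWC procedure: the function name `phase_summary`, the use of the within-phase session index as the x-axis for the trend line, and the choice of the population standard deviation (rather than the sample standard deviation) are assumptions made for the example.

```python
from statistics import mean, pstdev

def phase_summary(values):
    """Level, trend, and variability for one phase's data series (illustrative).

    level: mean of the observations in the phase
    trend: slope of the least-squares line fit to (session index, value)
    variability: range and population standard deviation around the mean
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate a trend")
    xs = range(n)  # session index within the phase: 0, 1, ..., n - 1
    x_bar, y_bar = mean(xs), mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return {
        "level": y_bar,
        "trend": sxy / sxx,
        "range": max(values) - min(values),
        "sd": pstdev(values),
    }

# Hypothetical data for a baseline (A) phase and an intervention (B) phase.
print(phase_summary([8, 9, 7, 8, 8]))
print(phase_summary([5, 4, 3, 3, 2]))
```

Comparing the two summaries shows a lower level and a steeper downward trend in the hypothetical intervention phase; visual analysis weighs these same properties along with overlap and immediacy of effect.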

Table 1, adapted from Hayes (1981) but without including the original “design type” 
designations, presents the three major types of SCDs and their variations. In AB designs, a case’s 
performance is measured within each condition of the investigation and compared between or 
among conditions. In the most basic two-phase AB design, the A condition is a baseline or 
preintervention series/phase and the B condition is an intervention series/phase. It is difficult to 
draw valid causal inferences from traditional two-phase AB designs because the lack of 
replication in such designs makes it more difficult to rule out alternative explanations for the 
observed effect (Kratochwill & Levin, in press). Furthermore, repeating an AB design across 
several cases in separate or independent studies would typically not allow for drawing valid 
inferences from the data (Note: this differs from multiple baseline designs, described below, 
which introduce the intervention at different points in time). The Standards require a minimum 
of four A and B phases, such as the ABAB design. 
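The counting logic behind this requirement can be sketched by collapsing session-by-session condition labels into phases. This is an illustrative helper (`phase_structure` is a hypothetical name); it only counts phases and phase changes and does not rate a design. Each change between adjacent A and B phases is one opportunity to demonstrate an effect, so an AB series offers one such opportunity and an ABAB series offers three, matching the three demonstrations the Standards require.

```python
from itertools import groupby

def phase_structure(labels):
    """Collapse session-level condition labels (e.g., "AAABBB") into an ordered
    list of phases and count the phase changes between adjacent phases."""
    phases = [condition for condition, _ in groupby(labels)]
    return phases, max(len(phases) - 1, 0)

for design in ["AAAABBBB", "AAAABBBBAAAABBBB"]:
    phases, changes = phase_structure(design)
    print("".join(phases), "phases:", len(phases), "phase changes:", changes)
```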

There are three major classes of SCD that incorporate phase repetition, each of which can 
accommodate some form of randomization to strengthen the researcher’s ability to draw valid 
causal inferences (see Kratochwill & Levin, in press, for discussion of such randomization 
applications). These design types include the ABAB design (as well as the changing criterion 
design, which is considered a variant of the ABAB design), the multiple baseline design, and the 
alternating treatments design. Valid inferences associated with the ABAB design are tied to the 
design’s structured repetition. The phase repetition occurs initially during the first B phase, again 
in the second A phase, and finally in the return to the second B phase (Horner et al., 2005). This 
design and its effect replication standard can be extended to multiple repetitions of the treatment 
(e.g., ABAB ABAB) and might include multiple treatments in combination that are introduced in 
a repetition sequence as, for example, A/(B+C)/A/(B+C)/A (see Table 1). In the case of the 
changing criterion design, the researcher begins with a baseline phase and then schedules a series 
of criterion changes or shifts that set a standard for participant performance over time. The 
criteria are typically pre-selected and change is documented by outcome measures changing with 
the criterion shifts over the course of the experiment. 
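Whether the outcome steps along with each criterion shift can be illustrated with a small check. This sketch is a deliberate simplification under assumptions: `criterion_tracking` is a hypothetical helper, the tolerance band is chosen by the analyst for illustration (the source specifies no such value), and comparing subphase means ignores the trend and variability information that visual analysis also uses.

```python
from statistics import mean

def criterion_tracking(subphases, criteria, tolerance):
    """For each B subphase (B1, B2, ...), report whether the subphase mean lies
    within `tolerance` of the criterion level scheduled for that step.

    subphases: list of lists of observations, one list per subphase
    criteria: criterion level for each subphase, in the same order
    tolerance: hypothetical band set by the analyst, not a WWC value
    """
    if len(subphases) != len(criteria):
        raise ValueError("one criterion per subphase is required")
    return [abs(mean(obs) - c) <= tolerance for obs, c in zip(subphases, criteria)]

# Hypothetical data: daily counts of a target behavior, criterion lowered stepwise.
b_phases = [[15, 14, 15], [12, 11, 12], [9, 8, 9]]
print(criterion_tracking(b_phases, criteria=[15, 12, 9], tolerance=1))
```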
TABLE 1

EXAMPLE SINGLE-CASE DESIGNS AND ASSOCIATED CHARACTERISTICS

Representative example design: Simple phase change designs [e.g., ABAB; BCBC and the
changing criterion design].* (In the literature, ABAB designs are sometimes referred to as
withdrawal designs, intrasubject replication designs, or reversal designs.)
Characteristics: In these designs, estimates of level, trend, and variability within a data series
are assessed under similar conditions; the manipulated variable is introduced and concomitant
changes in the outcome measure(s) are assessed in the level, trend, and variability between
phases of the series, with special attention to the degree of overlap, immediacy of effect, and
similarity of data patterns in similar phases (e.g., all baseline phases).

Representative example design: Complex phase change [e.g., interaction element: B(B+C)B;
C(B+C)C]
Characteristics: In these designs, estimates of level, trend, and variability in a data series are
assessed on measures within specific conditions and across time.

Representative example design: Changing criterion design
Characteristics: In this design the researcher examines the outcome measure to determine if it
covaries with changing criteria that are scheduled in a series of predetermined steps within the
experiment. An A phase is followed by a series of B phases (e.g., B1, B2, B3...BT), with the Bs
implemented with criterion levels set for specified changes. Changes/differences in the outcome
measure(s) are assessed by comparing the series associated with the changing criteria.

Representative example design: Alternating treatments (In the literature, alternating treatment
designs are sometimes referred to as part of a class of multi-element designs.)
Characteristics: In these designs, estimates of level, trend, and variability in a data series are
assessed on measures within specific conditions and across time. Changes/differences in the
outcome measure(s) are assessed by comparing the series associated with different conditions.

Representative example design: Simultaneous treatments (In the literature, simultaneous
treatment designs are sometimes referred to as concurrent schedule designs.)
Characteristics: In these designs, estimates of level, trend, and variability in a data series are
assessed on measures within specific conditions and across time. Changes/differences in the
outcome measure(s) are assessed by comparing the series across conditions.

Representative example design: Multiple baseline (e.g., across cases, across behaviors, across
situations)
Characteristics: In these designs, multiple AB data series are compared and introduction of the
intervention is staggered across time. Comparisons are made both between and within a data
series. Repetitions of a single simple phase change are scheduled, each with a new series and in
which both the length and timing of the phase change differ across replications.

Source: Adapted from Hayes (1981) and Kratochwill & Levin (in press). To be reproduced with permission.

* “A” represents a baseline series; “B” and “C” represent two different intervention series.
Another variation of SCD methodology is the alternating treatments design, which, relative
to the ABAB and multiple baseline designs, potentially allows for more rapid comparison of two
or more conditions (Barlow & Hayes, 1979; Hayes, Barlow, & Nelson-Gray, 1999). In the
typical application of the design, two separate interventions are alternated following the baseline 
phase. The alternating feature of the design occurs when, subsequent to a baseline phase, the 
interventions are alternated in rapid succession for some specified number of sessions or trials. 
As an example, Intervention B could be implemented on one day and Intervention C on the next, 
with alternating interventions implemented over multiple days. In addition to a direct comparison 
of two interventions, the baseline (A) condition could be continued and compared with each 
intervention condition in the alternating phases. The order of this alternation of interventions 
across days may be based on either counterbalancing or a random schedule. Another variation, 
called the simultaneous treatment design (sometimes called the concurrent schedule design), 
involves exposing individual participants to the interventions simultaneously, with the 
participant’s differential preference for the two interventions being the focus of the investigation. 
This latter design is used relatively infrequently in educational and psychological research, 
however. 
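The two ordering options named above can be sketched as schedule generators. Both are illustrations under stated assumptions: the function names are hypothetical, the counterbalanced version uses a simple B C, C B, B C, ... reversal pattern, and the random version shuffles the order within each block of consecutive days (a block randomization scheme chosen here for the example, not one prescribed in the source), so that with two interventions neither appears more than twice in a row.

```python
import random

def counterbalanced_schedule(n_days, conditions=("B", "C")):
    """Alternate the conditions, reversing their order in every other block,
    so each condition comes first equally often: B C C B B C C B ..."""
    schedule, block = [], 0
    while len(schedule) < n_days:
        order = conditions if block % 2 == 0 else tuple(reversed(conditions))
        schedule.extend(order)
        block += 1
    return schedule[:n_days]

def random_block_schedule(n_days, conditions=("B", "C"), seed=None):
    """Within each block of len(conditions) days, every condition occurs once,
    in an order drawn at random."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_days:
        block = list(conditions)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_days]

print(counterbalanced_schedule(8))
print(random_block_schedule(8, seed=1))
```

The seed argument makes a random schedule reproducible, which allows it to be documented before data collection begins.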

The multiple baseline design involves an effect replication option across participants, 
settings, or behaviors. Multiple AB data series are compared and introduction of the intervention 
is staggered across time. In this design, more valid causal inferences are possible by staggering 
the intervention across one of the aforementioned units (i.e., sequential introduction of the 
intervention across time). The minimum number of phase repetitions needed to meet the standard 
advanced by Horner et al. (2005) is three, but four or more is recognized as more desirable (and
statistically advantageous in cases in which, for example, the researcher is applying a 
randomization statistical test). Adding phase repetitions increases the power of the statistical test, 
similar to adding participants in a traditional group design (Kratochwill & Levin, in press). The 
number and timing of the repetitions can vary, depending on the outcomes of the intervention. 
For example, if change in the dependent variable is slow to occur, more time might be needed to 
demonstrate experimental control. Such a circumstance might also reduce the number of phase 
repetitions that can be scheduled due to cost and logistical factors. Among the characteristics of 
this design, effect replication across series is regarded as the characteristic with the greatest 
potential for enhancing internal and statistical-conclusion validity (see, for example, Levin, 
1992). 
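Randomization tests for multiple baseline designs can take several forms. As one illustration, the sketch below assumes that each case's intervention start point was drawn at random from a pre-specified window of sessions, uses the sum of per-case mean differences (intervention minus baseline) as the test statistic, and computes an exact one-sided p-value over all possible start-point arrangements. The windows, data, statistic, and function names are assumptions for the example, not a procedure required by the Standards.

```python
from itertools import product
from statistics import mean

def effect(series, start):
    """Intervention-phase mean minus baseline mean when the intervention
    begins at (0-based) session index `start`."""
    return mean(series[start:]) - mean(series[:start])

def mb_randomization_test(series_by_case, actual_starts, eligible_starts):
    """Exact randomization test for a multiple baseline design (illustrative).

    Assumes each case's start point was drawn at random from its own window in
    `eligible_starts`. The p-value is one-sided for a predicted decrease.
    """
    observed = sum(effect(s, t) for s, t in zip(series_by_case, actual_starts))
    reference = [
        sum(effect(s, t) for s, t in zip(series_by_case, starts))
        for starts in product(*eligible_starts)  # every possible arrangement
    ]
    p_value = sum(1 for v in reference if v <= observed) / len(reference)
    return observed, p_value

# Hypothetical data: three cases with staggered starts at sessions 4, 6, and 8.
cases = [
    [8, 8, 9, 8, 3, 2, 3, 2, 2, 3, 2, 2],
    [7, 8, 7, 8, 7, 8, 2, 3, 2, 2, 3, 2],
    [9, 8, 9, 8, 9, 8, 9, 8, 3, 2, 2, 3],
]
windows = [[3, 4, 5], [5, 6, 7], [7, 8, 9]]
obs, p = mb_randomization_test(cases, [4, 6, 8], windows)
print(f"observed statistic {obs:.2f}, p = {p:.3f}")
```

Under this scheme, three cases with three eligible start points each yield 3^3 = 27 arrangements, so the smallest attainable p-value is 1/27 (about .037); adding a fourth case raises the count to 81 and lowers the floor to about .012, one concrete sense in which added phase repetitions increase the power of the test.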

Well-structured SCD research that embraces phase repetition and effect replication can rule 
out major threats to internal validity. The possible threats to internal validity in single-case 
research include the following (see also Shadish et al., 2002, p. 55): 



1. Ambiguous Temporal Precedence: Lack of clarity about which variable occurred 
first may yield confusion about which variable is the cause and which is the effect. 
Embedded in the SCD Standards is a criterion that the independent variable is actively
manipulated by the researcher, with measurement of the dependent variable occurring after that
manipulation. This sequencing ensures the presumed cause precedes the presumed effect. A SCD
cannot meet Standards unless there is active manipulation of the independent variable. 3

Replication of this manipulation-measurement sequence in the experiment further 
contributes to an argument of unidirectional causation (Shadish et al., 2002). Effect replication, 
as specified in the Standards, can occur either through within-case replication or multiple-case 
replication in a single experiment, or by conducting two or more experiments with the same or 
highly similar intervention conditions included. The Standards specify that the study must show 
a minimum of three demonstrations of the effect through the use of the same design and 
procedures. Overall, studies that can meet standards are designed to mitigate the threat of 
ambiguous temporal precedence. 



2. Selection: Systematic differences between/among conditions in participant
characteristics could cause the observed effect.



In most single-case research, selection is generally not a concern because one participant is 
exposed to both (or all) of the conditions of the experiment (i.e., each case serves as its own 
control, as noted in features for identifying a SCD in the Standards). However, there are some 
conditions under which selection might affect the design’s internal validity. First, in SCDs that 
involve two or more between-case intervention conditions composed of intact “units” (e.g.,
pairs, small groups, and classrooms), differential selection might occur. The problem is that the
selected units might differ in various respects before the study begins. Because in most single-case
research the units are not randomly assigned to the experiment’s different intervention
conditions, selection might then be a problem. This threat can further interact with other 
invalidating influences so as to confound variables (a methodological soundness problem) and 
compromise the results (an evidence credibility problem). Second, the composition of intact units 
(i.e., groups) can change (generally decrease in size, as a result of participant attrition) over time 
in a way that could compromise interpretations of a treatment effect. This is a particular concern 
when within-group individual participants drop out of a research study in a treatment-related 
(nonrandom) fashion (see also No. 6 below). The SCD Standards address traditional SCDs and 
do not address between-case group design features (for Standards for group designs, see the 
WWC Handbook). Third, in the multiple baseline design across cases, selection might be an 
issue when different cases sequentially begin the intervention based on “need” rather than on a 
randomly determined basis (e.g., a child with the most serious behavior problem among several 
candidate participants might be selected to receive the treatment first, thereby weakening the 
study’s external validity). 
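One way to address the third concern is to let chance, rather than perceived need, decide which case enters the intervention at each staggered point. The sketch below is illustrative only: `assign_tiers` is a hypothetical helper, and the staggered start sessions are assumed to have been fixed in advance by the researcher.

```python
import random

def assign_tiers(cases, start_sessions, seed=None):
    """Randomly decide which case begins the intervention at each of the
    pre-set staggered start sessions, instead of ordering cases by need."""
    if len(cases) != len(start_sessions):
        raise ValueError("need exactly one start session per case")
    order = list(cases)
    random.Random(seed).shuffle(order)
    return dict(zip(order, sorted(start_sessions)))

print(assign_tiers(["Case 1", "Case 2", "Case 3"], [6, 9, 12], seed=7))
```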



3 Manipulation of the independent variable is usually either described explicitly in the Method section of 
the text of the study or inferred from the discussion of the results. Reviewers will be trained to identify cases in
which the independent variable is not actively manipulated; in such cases, the study Does Not Meet Standards.
3. History: Events occurring concurrently with the intervention could cause the
observed effect. 



History is typically the most important threat to any time series, including SCDs. This is 
especially the case in ex post facto single-case research because the researcher has so little ability 
to investigate what other events might have occurred in the past and affected the outcome, and in 
simple (e.g., ABA) designs, because one need find only a single plausible alternative event about 
the same time as treatment. The most problematic studies, for example, typically involve 
examination of existing databases or archived measures in some system or institution (such as a 
school, prison, or hospital). Nevertheless, the study might not always be historically confounded 
in such circumstances; the researcher can investigate the conditions surrounding the treatment 
and build a case implicating the intervention as being more plausibly responsible for the 
observed outcomes relative to competing factors. Even in prospective studies, however, the 
researcher might not be the only person trying to improve the outcome. For instance, the patient 
might make other outcome-related changes in his or her own life, or a teacher or parent might
make extra-treatment changes to improve the behavior of a child. SCD researchers should be 
diligent in exploring such possibilities. However, history threats are lessened in single-case 
research that involves one of the types of phase repetition necessary to meet standards (e.g., the 
ABAB design discussed above). Such designs reduce the plausibility that extraneous events 
account for changes in the dependent variable(s) because they require that the extraneous events 
occur at about the same time as the multiple introductions of the intervention over time, which is 
less likely to be true than is the case when only a single intervention is done. 
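The argument that phase repetition weakens history explanations can be expressed as simple arithmetic. Purely for illustration, assume that an extraneous event has probability p of coinciding with any single introduction or withdrawal of the intervention and that such coincidences are independent; the chance that extraneous events line up with all k phase changes is then p^k. Both the independence assumption and the value p = 0.2 are hypothetical.

```python
def coincidence_probability(p_single, n_phase_changes):
    """Probability that extraneous events coincide with every phase change,
    assuming (hypothetically) independent coincidences, each with probability
    p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return p_single ** n_phase_changes

# One phase change (AB) versus three phase changes (ABAB).
for k in (1, 3):
    print(k, coincidence_probability(0.2, k))
```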



4. Maturation: Naturally occurring changes over time could be confused with an 
intervention effect. 



In single-case experiments, because data are gathered across time periods (for example, 
sessions, days, weeks, months, or years), participants in the experiment might change in some 
way due to the passage of time (e.g., participants get older, learn new skills). It is possible that 
the observed change in a dependent variable is due to these natural sources of maturation rather 
than to the independent variable. This threat to internal validity is accounted for in the Standards 
by requiring not only that the design document three replications/demonstrations of the effect, 
but that these effects must be demonstrated at a minimum of three different points in time. As 
required in the Standards, selection of an appropriate design with repeated assessment over time 
can reduce the probability that maturation is a confounding factor. In addition, adding a control 
series (i.e., an A phase or control unit such as a comparison group) to the experiment can help 
diagnose or reduce the plausibility of maturation and related threats (e.g., history, statistical 
regression). For example, see Shadish and Cook (2009). 
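A common aid for judging whether an apparent effect is merely the continuation of a pre-existing trend, such as maturation, is to project the baseline trend line into the intervention phase and compare. The sketch below assumes an ordinary least-squares trend, consistent with the definition of trend given earlier in this section; the helper names and data are hypothetical, and the baseline must contain at least two observations.

```python
from statistics import mean

def ols_line(ys):
    """Intercept and slope of the least-squares line through (session index, value)."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def departure_from_baseline_trend(baseline, intervention):
    """Mean difference between intervention observations and the baseline
    trend line extended into the intervention phase."""
    intercept, slope = ols_line(baseline)
    start = len(baseline)
    projected = [intercept + slope * (start + i) for i in range(len(intervention))]
    return mean(obs - proj for obs, proj in zip(intervention, projected))

improving_baseline = [2, 3, 4, 5, 6]
print(departure_from_baseline_trend(improving_baseline, [7, 8, 9, 10]))     # trend simply continues
print(departure_from_baseline_trend(improving_baseline, [12, 13, 14, 15]))  # shift beyond the trend
```

A departure near zero, as in the first call, is what maturation alone would produce; only the second series departs from the projected baseline trend.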



5. Statistical Regression (Regression toward the Mean): When cases (e.g., single 
participants, classrooms, schools) are selected on the basis of their extreme scores, 
their scores on other measured variables (including re-measured initial variables) 
typically will be less extreme, a psychometric occurrence that can be confused with 
an intervention effect. 
In single-case research, cases are often selected because their pre-experimental or baseline
measures suggest high need or priority for intervention (e.g., immediate treatment for some 
problem is necessary). If only pretest and posttest scores were used to evaluate outcomes, 
statistical regression would be a major concern. However, the repeated assessment identified as a 
distinguishing feature of SCDs in the Standards (wherein performance is monitored to evaluate 
level, trend, and variability, coupled with phase repetition in the design) makes regression easy 
to diagnose as an internal validity threat. As noted in the Standards, data are repeatedly collected 
during baseline and intervention phases and this repeated measurement enables the researcher to 
examine characteristics of the data for the possibility of regression effects under various 
conditions. 
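The pretest-posttest problem can be shown with a hypothetical simulation in which no intervention is delivered at all. Each simulated case has a stable true score plus independent measurement error on each occasion; cases are selected for extreme first scores and simply measured again. All parameters (sample size, top 5% selection, error variance, seed) are arbitrary assumptions.

```python
import random
from statistics import mean

def simulate_regression(n_cases=10_000, select_top=0.05, error_sd=1.0, seed=2024):
    """Select the cases with the most extreme first measurement and re-measure
    them without any intervention. Returns (first mean, re-measured mean)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
    time1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    time2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    k = max(1, round(n_cases * select_top))
    cutoff = sorted(time1, reverse=True)[k - 1]
    selected = [i for i in range(n_cases) if time1[i] >= cutoff]
    return mean(time1[i] for i in selected), mean(time2[i] for i in selected)

before, after = simulate_regression()
print(f"selected cases: first measurement {before:.2f}, re-measurement {after:.2f}")
```

Under these assumptions the re-measured mean of the selected cases falls well below their selection mean even though nothing was done, which is why a single pre-post comparison can mislead and why a series of baseline observations is informative.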



6. Attrition: Loss of respondents during a single-case time-series intervention study can 
produce artifactual effects if that loss is systematically related to the experimental 
conditions. 



Attrition (participant dropout) can occur in single-case research and is especially a concern 
under at least three conditions. First, premature departure of participants from the experiment 
could render the data series too short to examine level, trend, variability, and related statistical 
properties of the data, which thereby may threaten data interpretation. Hence, the Standards 
require a minimum of five data points in a phase to meet evidence standards without 
reservations. Second, attrition of one or more participants at a critical time might compromise the 
study’s internal validity and render any causal inferences invalid; hence, the Standards require a 
minimum of three phase repetitions to meet evidence standards. Third, in some single-case 
experiments, intact groups comprise the experimental units (e.g., group-focused treatments, 
teams of participants, and classrooms). In such cases, differential attrition of participants from 
one or more of these groups might influence the outcome of the experiment, especially when the 
unit composition change occurs at the point of introduction of the intervention. Although the 
Standards do not automatically exclude studies with attrition, reviewers are asked to attend to 
attrition when it is reported. Reviewers are encouraged to note that attrition can occur when (1) 
an individual fails to complete all required phases of a study, (2) the case is a group and 
individuals attrite from the group, or (3) the individual does not have adequate data points within
a phase. Reviewers should also note when the researcher reports that cases were dropped and 
record the reason for that (for example, being dropped for nonresponsiveness to treatment). To 
monitor attrition through the various phases of single-case research, reviewers are asked to apply 
a template embedded in the coding guide similar to the flow diagram illustrated in the 
CONSORT Statement (Moher, Schulz, & Altman, 2001) and adopted by the American 
Psychological Association for randomized controlled trials research (APA Publications and 
Communications Board Working Group on Journal Article Reporting Standards, 2008). See 
Appendix A for the WWC SCD attrition diagram. Attrition noted by reviewers should be 
brought to the attention of principal investigators (PIs) to assess whether the attrition may impact
the integrity of the study design or evidence that is presented. 
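Two of the attrition conditions listed above, incomplete phase sequences and phases with too few data points, lend themselves to a mechanical check. The sketch is a reviewer aid under assumptions, not the WWC coding guide: the planned ABAB sequence and the helper name are hypothetical, the five-point threshold is the per-phase minimum stated above, attrition of members from a group case (condition 2) is not modeled, and the check does not verify that the observed phase order matches the plan.

```python
from itertools import groupby

MIN_POINTS_PER_PHASE = 5  # per-phase minimum stated above (without reservations)
PLANNED_PHASES = "ABAB"   # hypothetical planned design for this case

def attrition_flags(sessions, planned=PLANNED_PHASES, min_points=MIN_POINTS_PER_PHASE):
    """Flag incomplete phase sequences and short phases for one case.

    sessions: condition labels for the sessions actually observed, in order.
    """
    phases = [(condition, sum(1 for _ in run)) for condition, run in groupby(sessions)]
    flags = []
    if len(phases) < len(planned):
        flags.append(f"completed {len(phases)} of {len(planned)} planned phases")
    for number, (condition, n_points) in enumerate(phases, start=1):
        if n_points < min_points:
            flags.append(f"phase {number} ({condition}) has {n_points} data points")
    return flags

print(attrition_flags("AAAAABBBBBAAAAABBBBB"))  # complete ABAB series
print(attrition_flags("AAAAABBBBBAAA"))         # case lost during the second A phase
```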



7. Testing: Exposure to a test can affect scores on subsequent exposures to that test, an 
occurrence that can be confused with an intervention effect. 
In SCDs, there are several different possibilities for testing effects; in particular, many
measurements are likely to be “reactive” when administered repeatedly over time. For example, 
continuous exposure of participants to some curriculum measures might improve their 
performance over time. Sometimes the assessment process itself influences the outcomes of the 
study, such as when direct classroom observation causes change in student and teacher 
behaviors. Strategies to reduce or eliminate these influences have been proposed (Cone, 2001). 
In single-case research, the repeated assessment of the dependent variable(s) across phases of the 
design can help identify this potential threat. The effect replication standard can enable the 
researcher to reduce the plausibility of a claim that testing per se accounted for the intervention 
effect (see Standards).



8. Instrumentation: The conditions or nature of a measure might change over time in a 
way that could be confused with an intervention effect. 



Confounding due to instrumentation can occur in single-case research when changes in a 
data series occur as a function of changes in the method of assessing the dependent variable over 
time. One of the most common examples occurs when data are collected by assessors who 
change their method of assessment over phases of the experiment. Such factors as reactivity, 
drift, bias, and complexity in recording might influence the data and implicate instrumentation as 
a potential confounding influence. Reactivity refers to the possibility that observational scores 
are higher as a result of the researcher monitoring the observers or observational process. 
Observer drift refers to the possibility that observers may change their observational definitions 
of the construct being measured over time, so that scores are no longer comparable across phases 
of the experiment. Observational bias refers to the possibility that observers may be influenced 
by a variety of factors associated with expected or desired experimental outcomes, thereby 
changing the construct under assessment. Complexity may influence observational assessment 
because more complex observational codes make acceptable levels of observer agreement harder to 
obtain than less complex codes do. Numerous recommendations to 
control these factors have been advanced and can be taken into account (Hartmann, Barrios, & 
Wood, 2004; Kazdin, 1982). 



9. Additive and Interactive Effects of Threats to Internal Validity: The impact of a 
threat can be added to that of another threat or may be moderated by levels of another 
threat. 



In SCDs, the aforementioned threats to validity may be additive or interactive. Nevertheless, 
the “Criteria for Designs that Meet Evidence Standards” and the “Criteria for Demonstrating 
Evidence of a Relation between an Independent and an Outcome Variable” have been crafted 
largely to address the internal validity threats noted above. Further, reviewers are encouraged to 
follow the approach taken with group designs, namely, to consider other confounding factors that 
might have a separate effect on the outcome variable (i.e., an effect that is not controlled for by 
the study design). Such confounding factors should be discussed with PIs to determine whether 
the study Meets Standards. 
D. THE SINGLE-CASE DESIGN STANDARDS 



The PI within each topic area will: (1) define the independent and outcome variables under 
investigation;⁴ (2) establish parameters for considering fidelity of intervention implementation;⁵ 
and (3) consider the reasonable application of the Standards to the topic area and specify any 
deviations from the Standards in that area's protocol. For example, when measuring self-injurious 
behavior, a baseline phase of fewer than five data points may be appropriate. PIs might need to 
make decisions about whether the design is appropriate for evaluating an intervention. For 
example, an intervention associated with a permanent change in participant behavior should be 
evaluated with a multiple baseline design rather than an ABAB design, because the behavior 
would not be expected to return to baseline levels when the intervention is withdrawn. PIs will 
also consider the various threats to validity and how the researcher was able to address these 
concerns, especially in cases in which the Standards do not necessarily mitigate the validity 
threat in question (e.g., testing, instrumentation). Note that the SCD Standards apply to both 
observational measures and standard academic assessments. Similar to the approach with group 
designs, PIs are encouraged to define the parameters associated with “acceptable” assessments in 
their protocols. For example, repeated measures with alternate forms of an assessment may be 
acceptable, in which case WWC psychometric criteria would apply. PIs might also need to make 
decisions about particular studies. Several questions will need to be considered, such as: 
(a) Will generalization variables be reported? (b) Will follow-up phases be assessed? (c) If more 
than one consecutive baseline phase is present, are these treated as one phase or two distinct 
phases? and (d) Are multiple treatments conceptually distinct or multiple components of the 
same intervention? 



SINGLE-CASE DESIGN STANDARDS 



These Standards are intended to guide WWC reviewers in identifying and evaluating SCDs. 
The first section of the Standards assists with identifying whether a study is an SCD. As depicted 
in Figure 1, an SCD should be reviewed using the “Criteria for Designs that Meet Evidence 
Standards” to determine those that Meet Evidence Standards, those that Meet Evidence 
Standards with Reservations, and those that Do Not Meet Evidence Standards. 

Studies that meet evidence standards (with or without reservations) should then be reviewed 
using the “Criteria for Demonstrating Evidence of a Relation between an Independent Variable 
and a Dependent Variable” (see Figure 1).⁶ This review will result in a sorting of SCD studies 
into three groups: those that have Strong Evidence of a Causal Relation, those that have 
Moderate Evidence of a Causal Relation, and those that have No Evidence of a Causal Relation. 
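The two-stage review described above can be sketched in Python to make the gating explicit. This is a minimal illustration under stated assumptions, not a WWC tool: the names `DesignRating`, `EvidenceRating`, `review_study`, and `rate_evidence` are hypothetical, and the actual design and evidence criteria are not encoded here. The reviewer's application of the evidence criteria is represented by a callable that returns an evidence rating.

```python
from enum import Enum
from typing import Callable, Optional


class DesignRating(Enum):
    """Stage 1 outcome: rating of the study design against the Design Standards."""
    MEETS = "Meets Evidence Standards"
    MEETS_WITH_RESERVATIONS = "Meets Evidence Standards with Reservations"
    DOES_NOT_MEET = "Does Not Meet Evidence Standards"


class EvidenceRating(Enum):
    """Stage 2 outcome: strength of evidence for a causal relation."""
    STRONG = "Strong Evidence of a Causal Relation"
    MODERATE = "Moderate Evidence of a Causal Relation"
    NO_EVIDENCE = "No Evidence of a Causal Relation"


def review_study(
    design: DesignRating,
    rate_evidence: Callable[[], EvidenceRating],
) -> Optional[EvidenceRating]:
    """Evaluate design first; rate evidence only if the design passes.

    `rate_evidence` is a hypothetical stand-in for a reviewer applying the
    evidence criteria. It is never called for a study that does not meet
    design standards, in which case None is returned.
    """
    if design is DesignRating.DOES_NOT_MEET:
        return None  # evidence criteria are not applied
    # Studies that meet standards, with or without reservations, proceed.
    return rate_evidence()
```

For example, `review_study(DesignRating.MEETS_WITH_RESERVATIONS, lambda: EvidenceRating.MODERATE)` returns `EvidenceRating.MODERATE`, while any study rated `DOES_NOT_MEET` returns `None` without the evidence stage being invoked, mirroring the "first evaluate design, then if applicable, evaluate evidence" sequence named in Figure 1.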



4 Because SCDs rely on phase repetition and effect replication across participants, settings, and 
researchers to establish external validity, specification of the intervention materials, procedures, and context of the 
research is particularly important within these studies (Horner et al., 2005). 

5 Because interventions are applied over time, continuous measurement of implementation is a relevant 
consideration. 

6 This process results in a categorization scheme that is similar to that used for evaluating evidence 
credibility by inferential statistical techniques (hypothesis testing, effect-size estimation, and confidence-interval 
construction) in traditional group designs. 
FIGURE 1 



PROCEDURE FOR APPLYING SCD STANDARDS: FIRST EVALUATE DESIGN, 
THEN IF APPLICABLE, EVALUATE EVIDENCE 